HTMLCon Version 1.7 (April, 1995)
An HTM(L) to ASCii Document Converter
Satore Township
P.O. Box 750836
Petaluma, CA 94975-0836
WWW to ftp://ftp.crl.com/ftp/users/ro/mikekell/html/satore.htm
FTP to ftp.crl.com/ftp/users/ro/mikekell/ftp
This program may be distributed freely as long as no
modifications are made to it or this documentation. We
ask that you register this program if you find it useful.
The registration fee of $7.00 (U.S., by check) should be
mailed to Satore Township at the address given above. If
you register this program and provide us with your e-mail
address, we will provide you with the command to eliminate
the registration request screen which appears when the
program is initiated.
E-mail to mikekell@crl.com for comments or suggestions.
About the Program
-----------------
HTMLCon converts HTML (extension .HTM) documents to standard ASCII
(extension .ASC) files for viewing or printing. HTMLCon operates
under MSDOS or under any program capable of providing an MSDOS session
and using COMMAND.COM as a command interpreter. After processing the
input document (.HTM), output will be displayed on a viewer or editor
of your choice as defined in the control file (see below).
HTMLCon recognizes HTML symbology through HTML+ level as of this date.
It will automatically detect HTML files created in either an MSDOS or
UNIX environment and process them correctly. HTMLCon will attempt to
process the raw HTML file such that the output is as readable as
possible, eliminating unfavorable formatting to every extent practical.
Output may then be viewed, edited or printed as desired.
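The automatic MSDOS/UNIX detection mentioned above can be pictured with
a short Python sketch. This is an illustration only, not HTMLCon's own
code: the function names are hypothetical, and the rule used (count
CR LF pairs against bare LF characters) is an assumption about one
reasonable way to tell the two file types apart.

```python
def detect_line_ending(data):
    """Guess whether raw file bytes use MSDOS (CR LF) or UNIX (LF) line ends."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    # Assumed rule: call the file MSDOS only when CR LF pairs
    # outnumber bare LF characters.
    return "MSDOS" if crlf > lf_only else "UNIX"


def normalize_lines(data):
    """Split raw bytes into lines, whichever convention the file uses."""
    text = data.decode("latin-1")  # every byte value decodes under latin-1
    return text.replace("\r\n", "\n").split("\n")
```

Once the lines are normalized this way, the rest of a converter can
work on them without caring where the file was created.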
What's New in Version 1.4
-------------------------
This version allows the user to specify "find" and "replace" strings
to modify output from the HTM file to the ASCii file. The user may
define up to 50 such strings, each with a length of up to 40 characters.
"FIND" strings in the HTM file will be converted to "REPLACE" strings
in the output file. These find/replace operations take place only after
HTMLCon has done its primary conversion and therefore allow the user to
make further refinements to the final output document.
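As a rough illustration of the find/replace pass described above, here
is a minimal Python sketch. It is not HTMLCon's own code: the function
name is hypothetical, and it assumes the pairs are applied in the order
listed, as plain text substitutions, to text that has already been
converted. The limits of 50 pairs and 40 characters come from this
documentation.

```python
MAX_PAIRS = 50    # limit stated in this documentation
MAX_LENGTH = 40   # maximum characters per string, also from this documentation


def apply_replacements(text, pairs):
    """Apply (find, replace) pairs, in order, to already-converted text."""
    if len(pairs) > MAX_PAIRS:
        raise ValueError("more than 50 find/replace pairs")
    for find, replace in pairs:
        if len(find) > MAX_LENGTH or len(replace) > MAX_LENGTH:
            raise ValueError("string longer than 40 characters")
        if find:  # skip empty find strings, which would match everywhere
            text = text.replace(find, replace)
    return text
```

For example, apply_replacements("a -- b", [(" -- ", "--")]) closes up
the spaces around the double hyphen.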
What's New in Version 1.6
-------------------------
Three new structures in the INI file have been added:
keepformatting=yes, will preserve the general structure of the original
HTML file and will not attempt any formatting whatsoever. The intent of
this option is to allow users to strip only HTML constructs while
preserving the author's original formatting (for better or worse).
ignoresymbols=yes, will tell HTMLCon not to insert its own symbols in
the place of certain HTML constructs. This works in conjunction with
the "keepformatting=yes" option above to preserve as much of the
original HTML construction as possible while still eliminating
unnecessary HTML constructs.
keephref=yes, will preserve all <A HREF, <A NAME, etc. constructs
when converting the HTML file.
Effective with version 1.53 two additional modifications have been made:
1. The user may specify any input file name, with any file extension.
If no file extension is specified, HTMLCon will assume the extension
".HTM" is indicated.
2. The intermediate file (WORKING.HTM) is now deleted after HTMLCon
completes processing.
Effective with version 1.54 the following enhancements have been made:
1. HTMLCon now allows users to specify output filenames on the command
line or in the interactive mode. If an output filename is not
specified, it will default to the base filename of the input file plus
the extension ".ASC".
2. It is possible to use two forms of command line arguments:
A. HTMLCon input_filename line_length output_filename, or
B. HTMLCon input_filename output_filename.
If option "B" above is used, the default line_length from the
HTMLCon.INI file will be used or, if not stated, the program default
line_length of 65 characters.
In addition, a number of bug fixes were included (such as proper
interpretation of <META and <!-- constructs, and others).
See the HTMLCon control file below for details.
A new command line option has been added to support the
"keepformatting" and "ignoresymbols" INI commands. If you invoke HTMLCon
as "HTMLCon HTML_filename CLEAN", HTMLCon will assume both
"keepformatting=yes" and "ignoresymbols=yes" for the file in use,
regardless of these statements in the INI file.
What's New in Version 1.7
-------------------------
HTMLCon now has the ability to process multiple input files. When used
in this mode HTMLCon will automatically assign the file extension '.ASC'
to all output files. HTMLCon will automatically detect the multiple file
input mode by the presence of a '*' or '?' in the input file name.
For example, suppose that HTMLCon resides in the directory "C:\HTMLCON"
and that there are several HTM/HTML files in the directory "C:\HTMLWRIT"
that you wish to process. First, move to the "C:\HTMLCON" directory,
then issue the command "HTMLCON C:\HTMLWRIT\*.HTM". HTMLCon will
process the files, one-by-one, asking you each time if you wish to
proceed with processing the next file. When asked if you wish to
proceed, you will be given the following options: Y)es (the default), N)o
(no to this file only), Q)uit (quit processing all files), or A)ll
(process all of the remaining files without pausing).
In addition, while in the multiple file mode, HTMLCon will create a
batch file (HTMLCONM.BAT) in the default directory. This batch file may
be run by the user to again process the same multiple files indicated on
the command line or in response to the input file name prompt given by
HTMLCon.
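The multiple file mode described above can be sketched in Python as
follows. This is an illustration under stated assumptions, not
HTMLCon's own code: all function names are hypothetical, the wildcard
matching uses Python's fnmatch module as an approximation of MSDOS
wildcard rules, and the treatment of A)ll as including the file being
asked about is an assumption.

```python
import fnmatch
import ntpath  # MSDOS-style path handling, usable on any platform


def is_multi_file(spec):
    """Multiple file mode is signalled by '*' or '?' in the input name."""
    return "*" in spec or "?" in spec


def plan_outputs(spec, names):
    """Pair each matching file name with an output name ending in .ASC."""
    pattern = ntpath.basename(spec).upper()
    plan = []
    for name in names:
        # MSDOS file names are not case sensitive, so compare in upper case.
        if fnmatch.fnmatchcase(name.upper(), pattern):
            plan.append((name, ntpath.splitext(name)[0] + ".ASC"))
    return plan


def select_files(names, ask):
    """Walk the files, honoring Y)es, N)o, Q)uit and A)ll answers."""
    chosen = []
    run_all = False
    for name in names:
        # An empty answer means Y, the default given in this documentation.
        answer = "Y" if run_all else (ask(name).strip().upper() or "Y")
        if answer == "Q":
            break              # quit processing all files
        if answer == "A":
            run_all = True     # stop asking for the remaining files
        if answer != "N":      # N skips this file only
            chosen.append(name)
    return chosen
```

Here "ask" is any function that takes a file name and returns the
user's reply, for example one built on Python's input().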
Installation
------------
Copy HTMLCON.EXE and HTMLCON.INI to a new directory of your choice.
The program is now ready to run. Source files (.HTM) should be
placed in this directory for processing. Output files (.ASC) will
be created in this directory.
Operation
---------
HTMLCon can be operated in the interactive mode by running "HTMLCon"
from the MSDOS session. It can also be run without operator
intervention by using the following command line arguments:
HTMLCon input_file[.HTM] line_length output_file[.ASC], or
HTMLCon input_file[.HTM] output_file[.ASC]
where "line_length" indicates where HTMLCon should try to break a line
for the output file, using values between 40 and 200 characters per
line. Preferences can be stated in HTMLCON.INI as shown below. The
default file extensions can be overridden on the command line for both
input and output files.
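The two command line forms can be told apart as in the following Python
sketch. It is an illustration only, not HTMLCon's own code. The names
are hypothetical, and the rule that an all-digit second argument is
taken as line_length is an assumption. The default of 65, the 40 to 200
range and the default extensions come from this documentation. The
CLEAN option is not handled here.

```python
import ntpath  # MSDOS-style path handling, usable on any platform

DEFAULT_LINE_LENGTH = 65  # program default stated in this documentation


def with_default_ext(name, ext):
    """Append ext only when the name carries no extension of its own."""
    return name if ntpath.splitext(name)[1] else name + ext


def parse_args(args, ini_line_length=None):
    """Return (input_file, line_length, output_file) for either form."""
    if not args or len(args) > 3:
        raise ValueError("expected 1 to 3 arguments")
    infile = with_default_ext(args[0], ".HTM")
    length = ini_line_length or DEFAULT_LINE_LENGTH
    rest = list(args[1:])
    # Assumed rule: an all-digit second argument is the line_length.
    if rest and rest[0].isdigit():
        length = int(rest.pop(0))
    outfile = rest.pop(0) if rest else None
    if rest:
        raise ValueError("unexpected extra argument")
    if not 40 <= length <= 200:
        raise ValueError("line_length must be between 40 and 200")
    if outfile is None:
        # Default: base name of the input file plus .ASC
        outfile = ntpath.splitext(infile)[0] + ".ASC"
    else:
        outfile = with_default_ext(outfile, ".ASC")
    return infile, length, outfile
```

With this sketch, parse_args(["PAGE", "72", "OUT"]) gives
("PAGE.HTM", 72, "OUT.ASC"), and parse_args(["PAGE"]) falls back to the
default line length and the output name PAGE.ASC.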
Images found in the HTM file are output as [IMAGE], HREF references as
[*]. Forms are properly noted and marked, as is preformatted text and
other special HTML symbols. Derivatives are ignored except when the
text is preformatted.
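To picture the [IMAGE] and [*] markers, here is a minimal Python sketch
using regular expressions. It is an illustration only and far simpler
than a real converter: the function name is hypothetical, the placement
of [*] in front of the link text is an assumption, and forms,
preformatted text and other special symbols are not handled.

```python
import re

IMG_TAG = re.compile(r"<IMG\b[^>]*>", re.IGNORECASE)
HREF_TAG = re.compile(r"<A\s+HREF\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]*>")


def mark_constructs(html):
    """Replace images and HREF anchors with markers, then drop other tags."""
    text = IMG_TAG.sub("[IMAGE]", html)
    text = HREF_TAG.sub("[*]", text)
    # Whatever tags remain (including </A>) are simply removed.
    return ANY_TAG.sub("", text)
```

Because the markers use square brackets rather than angle brackets,
the final tag-stripping step leaves them in place.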
Since the HTM Language is evolving continuously, it is possible that
HTMLCon may not recognize certain symbols properly. Also, since there
is great variation in the creation of HTML documents, it may not be
possible to ideally format all output. Problems with the output will be
corrected in future versions and we ask that you let us know of any
problems by sending us e-mail, including the original HTML document that
is not being processed correctly.
HTMLCon Control File
--------------------
The control file should be named HTMLCON.INI and exist in the same
directory as HTMLCon. Here is a sample, with explanations, of the
control file:
# HTMLCon Initialization File (current through version 1.5x)
#
# Lines beginning with a pound sign are considered comments.
# All other lines are considered instructions and must exactly follow
# the format described in this sample file. Arguments are separated
# by an equal sign (=) which must not be preceded or succeeded by
# a space or tab.
#
# Define the default point at which HTMLCon should attempt to break a
# line for the output file. The break is not guaranteed to occur at
# this point, but as close to it as possible to retain the syntax of
# the input line. Default=65.
#
linebreak=70
#
# Statistics can be compiled and written to the output file. Default=No.
# statistics=no
#
statistics=yes
#
# You may launch another program after HTMLCon finishes its work. This
# may be an ASCII file viewer, editor, or whatever. The launched program
# must be able to take the output file name as an argument. In order to
# accomplish this you must provide the FULL PATH to your program.
#
launchprog=c:\utils\list.com
#
# Find and replace: you may specify up to 50 strings to be located in
# the HTM file and replaced in the ASCII output file. These will be a
# direct replacement using the two commands "find=" and "replace=". Each
# "find" element will be replaced by a "replace" element, therefore you
# cannot have a "find=" statement without a following "replace=" statement.
# To specify leading or ending spaces in a statement, surround the statement
# with quotations ("). The strings cannot exceed 40 characters each.
#
find=" -- "
replace=--
#
# Here is an example replacing all reference symbols [*] with just *.
#find=[*]
#replace=*
#
# And replace all image symbols [IMAGE] with a shorter one.
#find=[IMAGE]
#replace=[I]
#
# And replace all HTMLCon list/tab markers with three spaces.
find=->
replace="   "
#
#
# You may elect to keep the formatting characteristics of the original
# HTML file intact. This will preserve white spaces, line breaks, etc. as
# originally constructed by the author of the HTML page. This option
# will also eliminate the HTMLCon tab markers (->) and replace them with
# four spaces to indicate tab lists. Uncomment the following line to
# preserve the original formatting:
#
# keepformatting=yes
#
#
# You may choose to have HTMLCon not replace certain HTML constructs
# with its own markers (for example, HTMLCon replaces image references
# with the symbol [IMAGE]). To have HTMLCon simply ignore its own symbols and
# not reference certain items in the original HTML file, uncomment the
# next line:
#
# ignoresymbols=yes
#
# You may instruct HTMLCon to preserve all <A HREF...> constructs when
# converting the HTML file. These references will be preserved intact,
# without modification. To use this feature, uncomment the next line:
#
# keephref=yes
#
#
# Eliminate the advertisements and delays
# [available to registered users only]
#
#
# End of file
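The control file rules shown in the sample above (comment lines
starting with a pound sign, name=value instructions, paired "find=" and
"replace=" statements, and quotation marks to keep leading or ending
spaces) can be sketched in Python as follows. This is an illustration
only, not HTMLCon's own code: the function names are hypothetical, and
the sketch does not enforce the rule about spaces or tabs around the
equal sign or the 50-pair and 40-character limits.

```python
def unquote(value):
    """Remove one pair of surrounding quotation marks, keeping inner spaces."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def parse_ini(text):
    """Return (settings, pairs) read from control file text."""
    settings = {}
    pairs = []
    pending = None  # a find= value still waiting for its replace=
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no instruction
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a name=value instruction: " + line)
        key = key.lower()
        value = unquote(value)
        if key == "find":
            if pending is not None:
                raise ValueError("find= without a following replace=")
            pending = value
        elif key == "replace":
            if pending is None:
                raise ValueError("replace= without a preceding find=")
            pairs.append((pending, value))
            pending = None
        else:
            settings[key] = value
    if pending is not None:
        raise ValueError("find= without a following replace=")
    return settings, pairs
```

Fed the uncommented lines of the sample, this returns the settings
linebreak, statistics and launchprog, plus two find/replace pairs.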